PREFACE


This report presents the results of a study that started at Leiden University, The
Netherlands. The goal of the study was to develop a quantitative approach to test the
efficacy of mental models that humans use to interact with artifices such as ATMs and
computer programs. As a matter of good fortune the particular mental model studied
appeared to have great relevance for the increased USAF interest in UAV technology. We
hope the approach will be applied fruitfully in this context.

The  major  part  of  this  work  was  performed  while  the  first  author  was  a  Senior 
Research  Associate  with  the  National  Research  Council  at  the  Armstrong  Laboratory. 

The authors are indebted to Dr. Richard Roberts and Dr. Anna Rowe for many
suggestions that improved the quality of the report.


SUMMARY  OF  RESULTS 


1.  Holland,  Holyoak,  Nisbett,  and  Thagard  (1986)  propose  a  formalism  to  capture  the 
dynamics  of  mental  models:  a  transition  function  defined  on  a  set  of  model  states,  the 
result  of  a  categorizing  of  environmental  states.  This  transition  function  mimics  the 
state  changes  that  unfold  in  the  environment.  The  paper  shows  that  the  addition  of  a 
few  reasonable  constraints  to  this  formalism  results  in  a  class  of  transition  functions 
with  well-known  properties  —  the  general  class  of  finite-state  machines. 

2. It is shown that finite-state machines can be empirically tested by contingency tables in
which symbols of an input alphabet (rows) are mapped on symbols of an output
alphabet (columns) and each state is represented as a different layer of
cross-classification. Model testing of Probabilistic Finite Automata can be
straightforwardly accomplished using chi-square based statistics.

3. Empirical evaluation of Deterministic Finite Automata can be accomplished by
applying techniques derived from Information Theory. Information Theory defines a
Deterministic Finite Automaton as a perfect channel. That is, the information
transmitted is equal to the maximum uncertainty. Model deviations can be quantified as
information loss (i.e., the difference between maximum uncertainty and information
transmitted).




INTRODUCTION 


Whenever someone has acquired the cognitive skill to use some interactive device
(e.g., a fighter jet, a car, an ATM, or a word processing system), he or she is assumed to
have a cognitive representation of it that is like a working model. Cognitive scientists
have used the term ‘mental model’ in these contexts to refer to a theoretical construct that
interrelates conceptual knowledge with procedural skills. The common core of many
theoretical treatments is the notion that cognitive systems construct models of a particular
content domain. These models can be mentally ‘run’, or manipulated to produce
inferences, explanations, and predictions about the system (Holland, Holyoak, Nisbett, &
Thagard, 1986; Holyoak, 1985; Holyoak, Koh, & Nisbett, 1989; Johnson-Laird, 1983,
1989; Payne, 1988; Rogers & Rutherford, 1992).

Holland et al. (1986) propose a transition function as a formalism to capture the
dynamics of mental models. They propose to conceive of a mental model as a mental
representation that encodes a particular environment into categories and subsequently
employs such categories to define an internal transition function that mimics the state
changes unfolding in this environment. A mental model is considered valid to the extent
that the relationship between the mental model and the corresponding part of the
environment is a homomorphism, that is, a many-to-one structure mapping of states, and state
change operators, from the external environment to the mental model (Holyoak, 1985).¹
This abstract characterization of mental models is neutral concerning the issue of the
information processing mechanisms that may be employed to construct mental models.
However, Holland et al. (1986) argue that mental models are assembled from sets of
production rules. The empirical credibility of their theory is tested through comparison of
the performance of humans or lower animals with the operation of particular production
systems (e.g., Holyoak et al., 1989).

In this paper we investigate how the abstract characterization of mental models
postulated by Holland et al. (1986) can be tested without simultaneously being
confounded with a particular cognitive architecture. Students of mental models are primarily
interested in the behavioral effects of mental models as knowledge representations, rather
than in grand theories of cognitive architecture (e.g., Payne, 1992). The focus of this
investigation will be those mental models that govern the interaction with relatively
complex devices (e.g., such as studied in the field of human-computer interaction). These
models are often referred to as “user models” (Norman, 1983, 1986).

In the first part of the paper a formal definition of mental models similar to that of
Holland et al. (1986) is presented. It will be shown that the addition of a few reasonable
constraints to the formalism proposed by Holland et al. (1986) results in a class of
transition functions with well-known properties, namely the general class of finite-state machines
(Davis, Sigal, & Weyuker, 1994; Denning, Dennis, & Qualitz, 1978; Kolman & Busby,
1987; Minsky, 1967). Modeling mental models as finite-state machines has several
advantages. First, the finite-state formalism provides systematic ways to achieve a minimal
form of a machine that accounts for a given set of input-output mappings (Denning et al.,
1978), thus securing a fully parsimonious account of any particular mental model
(unlike the production rules system approach). Second, because the theory of finite-state
machines is intertwined with the theory of abstract grammars, the likelihood of a given
sequence of input actions performed by a user (conceived of as an input alphabet) can be
estimated under the hypothesis of a specific finite-state machine. In this paper we present
a method to empirically test mental models conceptualized as finite-state machines.

¹ Or rather, a quasi-homomorphism (i.e., q-morphism), to allow for the fact that
most mental representations in real life will be imperfect.

Part two of the paper presents detailed calculation methods to demonstrate certain
insights that may be gained from this approach. Data from a spatial reasoning
experiment will be used. In this experiment second grade and third grade children were asked
to move an object from an initial state to a goal state on the basis of a schematic diagram
of a spatial structure. There were two versions of the task. One version resembled a city
plan with recognizable markers (i.e., shops and churches) and a recognizable object (i.e.,
a model of a bus). The other version had abstract markers (such as triangles and squares),
and a more abstract object to be moved (i.e., a pawn). In this paper only data from the
version employing the bus model are discussed.


PART I


A  formal  definition  of  mental  models 

The  Holland  et  al.  (1986)  conception  of  mental  models  consists  of  two  elements.  First,  a 
categorization  function  that  categorizes  environmental  states  and,  second,  a  transition 
function  defined  on  these  categories  of  environmental  states  which  mimics  the  state 
changes in the environment to be predicted. For ease of exposition the elements are
discussed in their reverse order.


The  transition  function 


A  finite-state  function 

Following Holland et al. (1986) we characterize a mental model M as a transition
function (or a next-state function) $\delta$ defined over a set of states Q and a set of inputs I.²

$$\delta : Q \times I \to Q \qquad (1)$$

The domain of the transition function $\delta$ is the set of all state-input pairs, and its range is a
subset of states. The transition function in Holland et al. (1986) and Holyoak (1985) does
not provide a model in the strictest sense. Rather, it provides a general metaphor for
theorizing about changes in an environment. It applies equally well to a very
‘local environment’ (e.g., a chessboard, pieces and players) as to the whole of the universe
(Holland et al., 1986, p. 30). The input or action term of the transition function can
represent (discrete) actions of human participants, as well as (continuous) autonomous effects
such as those caused by the working of the laws of nature (e.g., all fast-moving objects
slow down (Holland et al., 1986, p. 31)).

² Actually, we follow a notation more similar to the notation presented in Holyoak
(1985), which is clearer than Holland et al.’s (1986) notation.

In this paper a special class of transition functions is investigated. These
functions may be derived by constraining the general model further and by then assuming the
following properties of M, Q, I, and $\delta$:

•  The behavior of M is defined only at the moments t = 0, 1, 2, ...

•  The states $q_t$ are chosen from a finite set of states Q (i.e., the set of model states).

•  The input symbols $s_t$ are chosen from a finite alphabet I (i.e., the input alphabet).

•  The output symbols $o_t$ are chosen from a finite alphabet O (i.e., the output alphabet).

•  The behavior of M is uniquely determined by the sequence of input symbols that are
presented.

Now, M is constrained to describe discrete phenomena. The occurrence of discrete
phenomena may be represented as a sequence of events, in which any event is a ‘next-to’
event, and some events may be ‘initial’, whilst others may be ‘terminal’ events. It is
convenient to think of system M as a machine that can accept input, possibly produce
output, and have some sort of internal memory that can keep track of certain information
about these previous inputs. It is assumed that M’s memory for past events is of a fixed,
finite size. As a consequence, M can only distinguish between some finite number of
classes of possible event sequences. These classes will be called the states of the
machine. Two additional assumptions determine the finitude of M: the input and output
parameters of system M can only assume a finite number of distinct values. By convention
the sets of values which these parameters can assume are called the input alphabet (I) and
output alphabet (O), respectively. Each element in I and in O is called a symbol. Further,
it is assumed that M works at discrete intervals of time. At each time M is in one of these
states, say $q_t$. The state at the next time interval only depends on the previous state
$q_t$ and the input $s_t$ given at time t.

Together, these properties characterize the triple M = (Q, I, $\delta$) as a finite
automaton (Davis et al., 1994; Denning et al., 1978; Kolman & Busby, 1987; Minsky, 1967). The
transition function $\delta$, which maps $Q \times I$ to Q, consists of a finite set of productions.
Formally, productions that map the current state and input signal onto the next state are
designated $\delta(q_t, s_t) \to q_{t+1}$. The states $q \in Q$ refer to states of model M. In order to
produce observable output, M must encompass a function to relate the various model
states, or state transitions, to its output.

Let $\lambda$ be an output function, which maps $Q \times I$ to O. The output function $\lambda$
consists of a finite set of productions $\lambda(q_t, s_t) \to o_{t+1}$, which map the current state and
input signal onto the next output. The output function $\lambda$ qualifies M as a so-called Mealy
machine, which is usually referred to as a sixtuple, M = (Q, I, O, $\delta$, $\lambda$, $q_1$), where Q, I, O, $\delta$,
and $\lambda$ are as defined previously, and $q_1$ designates the initial state of M (i.e., $q_1 \in Q$). A
Mealy machine is a special type of automaton where the output is associated with the
transition between states. Thus, $\lambda(q_t, s_t) \to o_{t+1}$ gives the output that accompanies the
transition from state $q_t$ on input $s_t$. The output of a Mealy machine M in response to a
sequence of input symbols $s_1, s_2, \ldots, s_n$ is $\lambda(q_1, s_1), \lambda(q_2, s_2), \ldots, \lambda(q_n, s_n)$, where
$o_2, o_3, \ldots, o_{n+1}$ is the sequence of outputs produced in parallel to the state sequence
$q_2, q_3, \ldots, q_{n+1}$.
To further characterize finite-state machines, note that a special case of a
finite-state machine arises when

$$\delta(q, s) = \delta(s) \qquad (2)$$

Such a machine is called a trivial machine. A trivial machine determines a fixed function
between input and output. Finite-state machines belong to the group of computational
systems that can compute various functions. Thus, $\delta(q, s) = \delta_q(s)$, with q providing a
specification of the function to be computed. It is preferable to consider q a system
parameter, rather than a second argument of the function. The specific nature of the state
set of a finite-state machine will be further explicated in the next section.

Deterministic  versus  Probabilistic  Automata 

We  will  now  introduce  a  distinction  that  has  significant  implications  for  the  empirical  test 
of  mental  representations  modeled  as  finite-state  machines.  These  implications  will  be 
elaborated  upon  in  a  later  section. 

The finite automaton discussed so far is strictly deterministic in its actions: at each
moment the next state and the output symbol are uniquely determined by the present state
and input symbol. A deterministic finite automaton (DFA) can be considered as a special
case of a probabilistic finite automaton (PFA), where $S_1$ and $\delta(q_t, s_t)$ each consist of one
state, and $\lambda(q_t, s_t)$ consists of one output symbol. In a PFA the productions of the DFA
state transition function $\delta(q_t, s_t) \to q_{t+1}$ are replaced by productions of the form
$\delta(q_t, s_t) \to \{q_{t+1,1}, \ldots, q_{t+1,n}\}$. Thus, given a present state $q_t$ and an input symbol $s_t$,
various states $q \in Q$ can be the next state. To each of the n possible transitions a probability
$P_i(q_t, s_t)$ is assigned. Associated with each state $q_t$ and input symbol $s_t$ is a stochastic (column)
vector $f(q_t, s_t)$ of transition probabilities (i.e., an n-dimensional vector with non-negative
components, the sum of which equals 1).

In a similar way the productions of the output function $\lambda(q_t, s_t) \to o_{t+1}$ are written
as $\lambda(q_t, s_t) \to \{o_{t+1,1}, \ldots, o_{t+1,m}\}$. Thus, given a present state $q_t$ and input symbol $s_t$,
various output symbols $o \in O$ are possible. Each of the m possible outputs is assigned a
probability. The set of output probabilities is also a stochastic (row) vector $g(q_t, s_t)$.
Note that a stochastic (row or column) vector is a coordinate vector iff one of its
components equals 1. Thus, a deterministic Mealy machine represents the special case of a
probabilistic automaton, where all of the vectors f and g, as well as $S_1$ (the initial state
distribution) are coordinate vectors.

³ An alternative way to assign an output is by a function $\lambda' : Q \to O$. An
automaton with a $\lambda'$-type output function assigns an output symbol to each state. This type of
machine is known in the literature as a Moore machine. A prototypical example of a
Moore machine, or recognition machine, is a parity checker, that is, a machine that
indicates by its output whether the parity of a sequence of input symbols in the binary
alphabet {0, 1} is odd or even. Although this type of automaton can be shown to be formally
equivalent to a Mealy machine (e.g., Denning et al., 1978), in this paper we will only deal
with Mealy machines.
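As a rough illustration of the relation between probabilistic and deterministic automata, the Python sketch below stores one stochastic vector per (state, input) pair and checks whether every vector is a coordinate vector. The function names, the list encoding of the vectors, and the two-state example are assumptions made for this sketch; sampling uses the standard-library `random.Random.choices`.

```python
import random
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # (state, input symbol)


def is_stochastic(vec: Sequence[float], tol: float = 1e-9) -> bool:
    """Non-negative components that sum to 1."""
    return all(p >= 0 for p in vec) and abs(sum(vec) - 1.0) <= tol


def is_coordinate(vec: Sequence[float], tol: float = 1e-9) -> bool:
    """A stochastic vector with one component equal to 1 (all others are then 0)."""
    return is_stochastic(vec, tol) and any(abs(p - 1.0) <= tol for p in vec)


def is_deterministic(f: Dict[Key, List[float]], g: Dict[Key, List[float]],
                     initial: List[float]) -> bool:
    """A PFA reduces to a DFA when f, g and the initial distribution are coordinate vectors."""
    return all(is_coordinate(v) for v in [*f.values(), *g.values(), initial])


def step(f, g, states, outputs, q, s, rng):
    """Sample the next state and the output for present state q and input s."""
    next_q = rng.choices(states, weights=f[(q, s)])[0]
    out = rng.choices(outputs, weights=g[(q, s)])[0]
    return next_q, out


# Hypothetical example with two states, one input symbol and two output symbols.
states, outputs = ["q1", "q2"], ["o1", "o2"]
f_det = {("q1", "s"): [0.0, 1.0], ("q2", "s"): [1.0, 0.0]}
g_det = {("q1", "s"): [1.0, 0.0], ("q2", "s"): [0.0, 1.0]}
f_prob = {("q1", "s"): [0.3, 0.7], ("q2", "s"): [1.0, 0.0]}
init = [1.0, 0.0]
```

With `f_det` and `g_det` every call to `step` gives the same result for a given (state, input) pair, whereas `f_prob` allows either state to follow `q1`.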

Probabilistic finite automata may be useful to model mental representations of
interpersonal interactions or other processes with a stochastic component, such as running
a business, or planning interventions in a macroeconomic system. In this paper we
investigate the assessment of mental models of devices that are constructed such that their
behavior is (or may be expected to be) perfectly predictable, for example, airplanes,
cars, ATMs, application programs and other high-tech products of our culture.
Throughout the paper the term finite-state machines will be used to refer to deterministic
finite-state machines, unless explicitly stated otherwise.

Engineers  and  naive  users 

In summary, a finite-state machine can be viewed as a general mathematical model of an
interactive system defined by a finite number of states. When it is presented with an input
from an action performed by a user, then, as a function of this input and its current internal
state, it will respond by moving to another of its internal states, and produce an output.
Finite-state machines have been used to model devices ranging in complexity from simple
‘flip-flops’, such as light switches, to entire computers. Most engineering and scientific
investigations use finite-state machine models to characterize a particular system in order
to achieve effective control over and predictability of its behavior. This enterprise is not
principally different from the attempts of naive users to construct mental models in order
to gain control over a device and utilize it effectively.


The  categorization  function 

At  any  point  in  time,  a  person  interacting  with  a  computer  or  another  device  will  be  able 
to  observe  a  set  of  n  different  physical  states,  in  which  such  a  system  can  be.  These  states 
characterize  the  system’s  dynamic  behavior  and  we  will  refer  to  these  states  as  system 
states.  It  is  reasonable  to  expect  that  a  person  attempts  to  construct  a  simplified  model  by 
aggregating  the  system  states  into  useful  categories  and  ignoring  details  that  are  irrelevant 
to  the  purpose  of  the  model  (Holland  et  al.  1986). 

Some m dimensions can describe each of the system states. Each dimension k has
$p_k$ values. A single dimension, or a combination of individual dimensions, can form a
basis to partition the set of system states E. If the person is able to detect such a
dimension (or subset of dimensions) and considers it relevant for his or her purposes, then the
person will use that dimension (or subset of dimensions) to aggregate the set of system
states E into the set of model states Q. Formally, this aggregation process can be
described as a mapping of system states to model states. In the literature this mapping has
been referred to as a categorization function P (Holland et al., 1986), or an instantiation
function IF (Pylyshyn, 1984). We will follow the Holland et al. terminology.
Subsequently, some aspects of this function will be discussed.

Functionally  equivalent  system  states 

Any function necessarily determines an equivalence relation on its domain, which
partitions the domain into equivalence classes (see APPENDIX 1, section A). The
significance of the partitioning of the set of system states E into equivalence classes or
categories is based on the intensional definition of its categories. For each system state $e \in [e]$,
where $[e]$ denotes an equivalence class in the partition of E, an identical function $f : I \to O$ is
defined, where I denotes the finite set of input symbols and O the finite set of output symbols.
Each category $[e] \subseteq E$ has a unique function $f : I \to O$. Each category thus defines an equivalence
class of system states that are indistinguishable from a functional point of view. In other
words, a person who wants to utilize a device effectively has to partition the set of system
states into mutually exclusive categories. The partition should maximize what the person
can predict about the system’s behavior in response to his or her actions.

For illustrative purposes, let us consider the well-known children’s programming
environment called the LOGO ‘Turtle World’ (Papert, 1980; Abelson & diSessa, 1980).
This microworld has been especially designed to help children in developing useful ways
of thinking about computing. The LOGO ‘Turtle World’ provides a graphical interface in
which children can explore the effects of simple programming commands on the behavior
of a screen object, called the ‘turtle’. A set of very simple commands such as FORWARD
and BACK (move the turtle), and LEFT and RIGHT (turn the turtle) is available to affect
the state of the turtle on the computer screen. Each series of concrete actions performed by
the turtle creates a graphical trace on the computer screen. In this way, LOGO programming
permits the turtle to draw regular polygons and other geometrical shapes. Each system
state of the turtle can be described as a set of physical dimensions, for example, a
particular location (i.e., X- and Y-coordinates), orientation, and color of the turtle. Only one
of these dimensions is causally relevant for any of the state changes that may occur in
response to an input action performed by the student. Unless the student has discovered
this dimension, he or she will be confused by what seems like the unpredictable behavior
of the turtle. For example, in the LOGO ‘Turtle World’ the impact of the commands
given to the turtle is conditional upon its orientation on the screen. In the turtle’s primary
position, that is, facing up, the command FORWARD 100 results in a movement along
the Y-axis. However, when the turtle is facing east, and it is given the command
FORWARD 100, the turtle moves along the X-axis. This difference in the turtle’s behavior
has been shown to be a substantial source of confusion in young children (Fay & Mayer,
1987; Cohen, 1987).

For each orientation of the turtle a different mapping is defined of input symbols
(i.e., the programming commands FORWARD, BACK, RIGHT and LEFT) to output
symbols (changes in location or orientation of the turtle), irrespective of its location
and/or color. Thus, each orientation of the turtle defines an equivalence class of system
states that may differ in location and/or color, but share the same function relating
programming commands to the turtle’s behavior.
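The partition by orientation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: headings are restricted to the four compass directions, and the names `TurtleState`, `categorize`, `partition` and `forward_effect` are hypothetical, not taken from LOGO or from the source.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class TurtleState(NamedTuple):
    """A system state e: location, orientation and color of the turtle."""
    x: int
    y: int
    heading: str  # assumed to be one of "north", "east", "south", "west"
    color: str


def categorize(e: TurtleState) -> str:
    """Categorization function P: only the heading is kept as the model state."""
    return e.heading


def partition(states: List[TurtleState]) -> Dict[str, List[TurtleState]]:
    """Group system states into equivalence classes, one per model state."""
    classes: Dict[str, List[TurtleState]] = defaultdict(list)
    for e in states:
        classes[categorize(e)].append(e)
    return dict(classes)


# Unit displacement produced by FORWARD for each heading.
UNIT_STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def forward_effect(e: TurtleState, distance: int) -> Tuple[int, int]:
    """Output symbol for FORWARD: the change in location (dx, dy)."""
    dx, dy = UNIT_STEP[e.heading]
    return dx * distance, dy * distance


system_states = [
    TurtleState(0, 0, "north", "green"),
    TurtleState(40, -10, "north", "red"),
    TurtleState(0, 0, "east", "green"),
]
classes = partition(system_states)
```

States that fall into the same class (here the two north-facing turtles) respond identically to FORWARD 100, whatever their location or color, while the east-facing turtle responds differently.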

In summary, the basic claim is that the human user of an interactive system will
categorize system states such that the prediction of the effect of an input action is
maximally accurate. Knowledge of the current state of the machine reduces the uncertainty
about the effect of a possible action.

Encoding  and  decoding 

To further explore the relationship between the mental model and the system being modeled,
we will now investigate some characteristics of P. Let $e \in E$ and $q \in Q$. The function
$P : E \to Q$ is then defined by

$$P(e) = q \qquad (3)$$

Let $p : E \to E/R$ (see APPENDIX 1, section A) and $m : E/R \to Q$ be functions. Let
$[e] \in E/R$. The function $p : E \to E/R$ is then defined by

$$p(e) = [e] \qquad (4)$$

and the function $m : E/R \to Q$ is then defined by

$$m([e]) = q \qquad (5)$$

The function P can be written as a composition of p and m: $P(e) = (m \circ p)(e) = q$, or
$P(e) = m[p(e)] = q$. Since the function $m : E/R \to Q$ is a one-to-one function (see
APPENDIX 1, section B), the function m is invertible, that is, its inverse is also a
function. The function $p : E \to E/R$ is not one-to-one. Thus, p is not invertible. We will call
the functions P, p and m encoding functions, because they account for the aggregation of
the set of system states into the set of model states. Their inverse functions, if defined,
will be referred to as decoding functions. Note that there is only one decoding function:
$m^{-1} : Q \to E/R$.

Now consider M = (Q, I, $\delta$), a finite-state machine with state set $Q = \{q_1, \ldots, q_n\}$,
input set I, and state transition functions $\delta = \{\delta_s \mid s \in I\}$. For any $q, q' \in Q$ and $s \in I$, we
write $\delta_s(q) = q'$, that is, input s takes state q into state q'. The structure M is assumed to
be a representation of some behavior of a system in the environment. Therefore, the
representation law (e.g., Newell, 1990) applies, which in its general form states that the
essence of a representation is to allow one to go from one external situation to another by a
different path, that is, by manipulation of an internal representation, rather than by actually
affecting the initial external situation itself. In symbols:

$$\mathrm{decode}[\mathrm{encode}(T)(\mathrm{encode}(e))] = e' \qquad (6)$$

where e and e' are external situations, T is the external transformation, [encode(T)]
maps T onto symbol $s \in I$, [encode(e)] maps e onto $q \in Q$, and [decode(q')] maps q'
onto e' (after Newell, 1990, p. 59). Since P is a many-to-one function, the predictive
power of structure M is somewhat limited by the specificity of the partition. Note that P
does not have an inverse that is also a function. Thus, we write P as a composition of p and m.
At least m has an inverse function (i.e., the decoding function $m^{-1}$). Thus,
$$m^{-1}\{\delta_s[m(p(e))]\} = [e'] \qquad (7)$$

where $\delta_s$ denotes the state transition function which takes the symbol $s \in I$ as input.
Equation (7) readily demonstrates that the predictive power of a model M is limited to the
prediction of categories of system states (i.e., the finite-state machine can predict an
equivalence class $[e']$ instead of a single state e'). This can be a somewhat global
prediction (Holyoak, 1985). Only in the case of a maximally specific partition, that is, a
partition with only one system state per category, would this lead to the exact prediction of the
next system state. It seems plausible to assume that this limitation of the representational
structure requires the (human) cognitive system first to generate the model state
corresponding with the next system state (i.e., goal state or subgoal state), and then to choose
an input symbol to transfer the current model state into the next model state (compare the
operator-difference table of Newell & Simon’s (1972) General Problem Solver (see also
Charniak & McDermott, 1985)).
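A minimal sketch of the prediction in Equation (7), assuming a toy system with four system states grouped into two equivalence classes. All labels (`e1` to `e4`, `qA`, `qB`, the input symbol `s`) and the dictionary encoding are hypothetical and serve only to show that the decoded prediction is a class of system states rather than a single state.

```python
from typing import Dict, FrozenSet

# p: E -> E/R, each system state is sent to its equivalence class.
class_a: FrozenSet[str] = frozenset({"e1", "e2"})
class_b: FrozenSet[str] = frozenset({"e3", "e4"})
p: Dict[str, FrozenSet[str]] = {"e1": class_a, "e2": class_a,
                                "e3": class_b, "e4": class_b}

# m: E/R -> Q is one-to-one, so it can be inverted.
m: Dict[FrozenSet[str], str] = {class_a: "qA", class_b: "qB"}
m_inv: Dict[str, FrozenSet[str]] = {q: cls for cls, q in m.items()}

# delta_s: one transition table on model states per input symbol s.
delta: Dict[str, Dict[str, str]] = {"s": {"qA": "qB", "qB": "qA"}}


def encode(e: str) -> str:
    """Categorization function P(e) = m[p(e)]."""
    return m[p[e]]


def predict(e: str, s: str) -> FrozenSet[str]:
    """Equation (7): decode the next model state back into a class of system states."""
    return m_inv[delta[s][encode(e)]]


prediction = predict("e1", "s")
```

Here `prediction` is the whole class containing `e3` and `e4`; the model cannot say which of the two will occur. Only if every class held a single system state would the prediction be exact.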


Empirical  consequences 

To derive empirical consequences from this abstract characterization of the
representational structure implied by Holland et al.’s (1986) notion of a mental model, two auxiliary
assumptions have to be specified. First, participants are able to verbalize the symbols
$s \in I$ of M, which stand for the internal representations of the input actions performed by a
user. Second, the output symbols $o \in O$ of M, representing the system’s actions, may be
made observable. Then, the productions

$$\lambda(q_t, s_t) \to o_{t+1} \qquad (8)$$

contain two sets of observable entities, viz., the input symbols $s \in I$, and the output
symbols $o \in O$, as well as one set of non-directly observable entities (i.e., the internal states
$q \in Q$). However, by substituting (3) in (8), it is possible to obtain

$$\lambda(P(e_t), s_t) \to o_{t+1} \qquad (9)$$

where $e_t$ refers to the outcome e that realizes an event q at moment t. In Equation (9) all
terms represent observable events.

Equation (9) implies that all factors which connect $s_t$ and $o_{t+1}$ are explicitly
included in $q_t$. It is appropriate to map this conjecture into a probability framework by
interpreting $s_t$ and $o_{t+1}$ as events in two finite sample spaces I and O, respectively.
Further, the set of states E can be considered a sample space partitioned by P into a number
of equivalence classes [e], each of which corresponds with a particular model state. Thus,
an outcome $e \in [e]$ is said to realize event q. In the next section we present a method to
test a hypothesis involving the partitioning of a set of environment states into equivalence
classes that may be identified as states of the model.
Model predictions

Consider an $I \times J \times K$ cross-classification of states $q_i$, input symbols $s_j$,
$j = 1, \ldots, n_J$, and output symbols $o_k$, $k = 1, \ldots, n_K$. Assuming the events $s_j$ and $o_k$ are directly
observable, the event $q_i$ is realized by an outcome $e \in [e_i]$, based on a partitioning $P_k$ of
the set of states E. The relationship between $s_j$ and $o_k$ is supposed to be mediated by the
state variable $q_i$, such that $\lambda(q_i, s_j) \to o_k$. That is, for every combination of a state and an
input signal, one and only one output signal may be observed, although a given output
signal can be associated with more than one combination of state and input signal.

Let $P(q_i)$ represent the unconditional probability of being in state $q_i$. That is,
given the set of all conceivable observations for a participant, $P(q_i)$ represents the
probability that a randomly selected observation finds the system in state $q_i$. Similar
unconditional probabilities can be defined for $s_j$ and $o_k$. Write $P(q_i, s_j)$ for the
probability of the joint event of being in state $q_i$ and receiving input signal $s_j$, and
$P(q_i, s_j, o_k)$ for the probability of the joint event of being in state $q_i$, receiving
input signal $s_j$, and producing an output signal $o_k$. Each cell in the $I \times J \times K$
cross-classification represents the frequency of occurrence of a joint event
$(q_i, s_j, o_k)$, which can be easily converted into a probability. In general, the following
equality holds for this probability: $P(q_i, s_j, o_k) = P(o_k \mid s_j, q_i)\, P(q_i, s_j)$.
Similarly, $P(q_i, s_j) = P(q_i \mid s_j)\, P(s_j) = P(s_j \mid q_i)\, P(q_i)$. Thus we get the
basic equality:

$$P(q_i, s_j, o_k) = P(o_k \mid s_j, q_i)\, P(q_i \mid s_j)\, P(s_j) \qquad (10)$$

Therefore, a finite state model defines a pattern of expected frequencies in the
$I \times J \times K$ cross-classification table, which can be easily calculated from the
marginals. As for probabilistic finite automata, this poses no specific problems for model
tests using a chi-square based statistic (i.e., likelihood ratio). Note, however, that the
definition of the DFA implies $P(o_k \mid s_j, q_i) = 0$ or $1$. The dependency of $o_k$ on
$q_i$ and $s_j$ can be incorporated in the notation by writing $i'j'$ instead of $k$, and
writing $P(o_{i'j'} \mid s_j, q_i) = 1$ if $i' = i$ and $j' = j$, and zero otherwise. From this
it follows that $P(q_i, s_j, o_k) = P(q_i \mid s_j)\, P(s_j)$, or equals zero.
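
The deterministic constraint can be illustrated with a minimal sketch, assuming the three-way table is stored as a dictionary of counts keyed by (state, input, output). The function name and the counts are hypothetical and serve only as an illustration; they are not data from the study. The check flags every combination of state and input that was observed with more than one output, which is exactly the pattern a DFA forbids.

```python
from collections import defaultdict


def deterministic_violations(counts):
    """Return the (state, input) pairs observed with more than one output.

    `counts` maps (state, input, output) -> observed frequency.
    Under a DFA, P(o | s, q) is 0 or 1, so each pair should have one output.
    """
    outputs_seen = defaultdict(set)
    for (state, symbol, output), freq in counts.items():
        if freq > 0:
            outputs_seen[(state, symbol)].add(output)
    return {pair: outs for pair, outs in outputs_seen.items() if len(outs) > 1}


# Hypothetical counts, invented for illustration only.
counts = {
    ("north", "F", "y-"): 12,
    ("north", "L", "r-"): 5,
    ("north", "L", "r+"): 2,  # same state and input, a second output
    ("east", "F", "x+"): 9,
}
violations = deterministic_violations(counts)
```

An empty result corresponds to a table that is consistent with the deterministic model.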

Statistics are not required in order to test the deterministic model. When a purely
deterministic model predicts one (or zero) for a particular cell in a cross-classification
table, then every deviation from this predicted probability implies that the model should
be rejected. Thus, visual inspection of the table(s) is sufficient to assess whether the
hypothesis is confirmed or should be rejected. However, if model deviations are observed,
it may be useful to characterize the deviations from the model by measures that reflect how
well predictions from the model conform to the observations. The finite-state machine
describes essentially a functional, asymmetric relationship between a combination of input
signal and state on the one hand and an output signal on the other. Thus, the measure of fit
should allow for a characterization of such an asymmetric relationship. One such measure
is the lambda statistic, proposed by Goodman and Kruskal (1979). This statistic quantifies
the reduction in uncertainty in the prediction of one category from the other as a function of
their association. A disadvantage of this measure is that it only takes into account the
largest probability in a row (or column). Some of the other measures are based on the
concept of entropy as developed within Information Theory (see e.g., Krippendorff, 1986).
These measures are more closely related to notions familiar from the multivariate analysis
of numerical variables, such as the variance and proportion of explained variance for a
dependent variable, and (semi)partial correlations or covariances. These measures are now
defined.

An  information  theoretic  evaluation  of  model  coherence 

Entropy. Let the number of categories of a variable $X$ be denoted by $n_X$ and let the
relative frequency in category $i$ of $X$ be $p_i$ ($0 \le p_i \le 1$). The entropy $H(X)$ of
the variable is then defined as

$$H(X) = -\sum_i p_i \log_2 p_i \qquad (11)$$

Entropy can be interpreted as a measure of the variability of a numerical or nonnumeric
distribution. It shares with the variance of a distribution of a numerical variable the
property that it attains its maximum value if, for all $i$, $p_i$ is equal to $1/n$ (with $n$
the number of categories of the variable). It also attains its minimum value if $p_i$ is equal
to one for one $i$ and consequently zero for other categories (i.e., if the variable has only
one category). The maximum and minimum values are equal to $\log_2 n$ and 0, respectively
($0 \log_2 0$ is defined as zero).
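
As a brief illustration of Equation (11), the following sketch computes the entropy in bits of a list of relative frequencies. The function name `entropy` is chosen here for illustration. Categories with zero probability are skipped, which implements the convention that $0 \log_2 0$ is zero.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability categories contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])  # maximum for four categories: log2(4)
degenerate = entropy([1.0, 0.0, 0.0, 0.0])   # minimum: all mass in one category
```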

The logarithm in Equation (11) is taken to the base two, which leads to an
interesting interpretation of $H(X)$. It expresses the variability in $X$ in terms of the basic
unit of measurement within Information Theory: the average number of binary decisions, or bits
of information, necessary to make a classification within a system of categories (Attneave,
1959; Krippendorff, 1986; Shannon & Weaver, 1949). For example, consider a user of a
particular device (such as an application program) who chooses an input action $s$ from a set
of possible input actions $I$; then $H(I)$ expresses the average amount of uncertainty under
which the user operates. Note that actions that are logically possible, but never actually
chosen, do not enter the measure (i.e., actions $s$ such that $s \in I$, but with $p_s = 0$). To
this purpose the convention $0 \log_2 0 = 0$ is adopted here. The entropy measure $H(X)$ can be
extended to the joint distribution of two or more variables as
$H(XYZ\ldots) = -\sum_i \sum_j \sum_k \cdots\, p_{ijk\ldots} \log_2 p_{ijk\ldots}$. Analysis of
the joint distributions of variables is at the heart of Information Theory.

Entropy can be defined not only for marginal and joint distributions, but also for
conditional distributions. For a given value $x_i$ of $X$, the conditional entropy of $Y$,
$H(Y \mid x_i)$, is defined as $-\sum_j (p_{ij} / p_i) \log_2 (p_{ij} / p_i)$. Note that this
definition is completely analogous to the entropy of the marginal distribution. The only
difference is that the marginal probabilities $p_i$ are replaced by the conditional
probabilities $p_{ij} / p_i$. The average conditional entropy of $Y$, given $X$, is a weighted
sum of the conditional entropies for given values of $X$, with weights equal to $p_i$, the
proportion of observations for the particular value of $X$ on which the conditioning has
occurred. Thus,
$H(Y \mid X) = -\sum_i p_i \sum_j (p_{ij} / p_i) \log_2 (p_{ij} / p_i) = -\sum_i \sum_j p_{ij} \log_2 (p_{ij} / p_i)$.
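
The last expression for the average conditional entropy translates directly into a few lines of code. The sketch below assumes the joint distribution is given as a nested list `joint[i][j]` of joint probabilities; the function name and the example values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def conditional_entropy(joint):
    """H(Y | X) in bits from joint probabilities joint[i][j] = p_ij."""
    total = 0.0
    for row in joint:
        p_i = sum(row)  # marginal probability of the i-th value of X
        for p_ij in row:
            if p_ij > 0:
                total -= p_ij * math.log2(p_ij / p_i)
    return total


# Hypothetical joint distribution: the first value of X leaves Y fully
# uncertain (one bit), the second value of X determines Y completely.
joint = [[0.25, 0.25],
         [0.50, 0.00]]
h_y_given_x = conditional_entropy(joint)
```

Here the first row contributes half a bit (weight 0.5 times one bit) and the second row contributes nothing.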
Information Transmission. The predictability of output symbols $o \in O$ from the set
of input symbols $s \in I$ is in Information Theory defined as information transmission,
$T(I, O)$. The amount of information transmission can be expressed in several different ways
that are mathematically equivalent (Krippendorff, 1986). We will discuss two conceptions
of information transmission. The first conception defines information transmission,
$T(I, O)$, as the difference between the maximum entropy and the observed entropy.
Hence:

$$T(I, O) = H(I) + H(O) - H(IO) \qquad (12)$$

A participant endowed with a perfectly valid mental model of a device that can be
characterized as a DFA (i.e., with a mental model that is a DFA) would render an
observed entropy that is equal to zero (i.e., $H(IO) = 0$), that is, per row (or column,
whichever is appropriate), $p_i = 1$ for one cell and zero for all others, in which case both
$1 \log_2 1 = 0$ and by convention $0 \log_2 0 = 0$. In the case of a total lack of
predictability the observed entropy would equal its maximum value, $H(IO) = H(I) + H(O)$. That
is, the observed entropy equals the sum of the marginal entropies $H(I)$ and $H(O)$,
respectively.

Notice that participants can not only differ in the validity of the model they are
using, that is, the extent to which the observed entropy approaches zero, but also in the
way they use the model. Participants can differ in preference for certain input actions by
which they act upon the device. For example, recall the previously discussed LOGO
'Turtle World'. Let us assume that the turtle is facing "north"; then a vertical line with a
length of 100 units can be drawn southward in several ways: (a) by a single command
(i.e., BACK 100), and (b) by a series of commands (i.e., [RIGHT 180 FORWARD 100]
or [LEFT 180 FORWARD 100]). Research shows that children have a preference for the
latter way of drawing the line (Campbell, Fein, Scholnick, Schwarts, & Frank, 1986; Fay
& Mayer, 1987). As a result the observed variety of input symbols, $H(I)$, will be lower
than can be expected on the basis of a uniform distribution of probabilities, since the use
of one particular symbol (i.e., BACK) is systematically avoided. Notice also that
reduction in observed variety in $I$ does not necessarily imply a similar reduction in $O$.
This may even occur in the case of a perfectly deterministic mental model (as the
example shows).

This conceptualization of predictability as a (general) reduction of uncertainty is
clear and straightforward as a first approach, but it does not provide precise information
about the contribution of each individual input symbol to the observed entropy. The
second conception of information transmission is based on the notion that knowing about
$I$ may reduce the uncertainty about $O$. Thus, $H(O \mid I) \le H(O)$ with information
transmission defined as

$$T(I, O) = H(O) - H(O \mid I) \qquad (13)$$

where $H(O \mid I)$ denotes the entropy in $O$ given $I$. In information theoretical terms
$H(O \mid I)$ represents the noise produced by the input symbols. $H(O \mid I)$ gives the
average entropy of $O$ over each observed input symbol, that is, a weighted sum of entropies
associated with each row of the matrix (i.e., $H(O \mid i)$). Thus,
$H(O \mid I) = \sum_i p(i)\, H(O \mid i)$, where $p(i)$ denotes the relative frequency of symbol
$i$ (or row $i$). The information transmission statistic can be straightforwardly extended to
three-way classifications. For example, $T(I, O \mid Q) = H(O \mid Q) - H(O \mid IQ)$, where the
dependence of $O$ on the joint occurrence of $I$ and $Q$ is modeled (see Part II).

Proportional Reduction in Uncertainty (PRU). The association between categories
in a two-way classification is conventionally expressed as

$$\eta_H(O \mid I) = T(I, O) / H(O) = [H(O) - H(O \mid I)] / H(O) \qquad (14)$$

In a three-way classification the additional factor leads to the following:
$\eta_H(O \mid IQ) = T(IQ, O) / H(O) = [H(O) - H(O \mid IQ)] / H(O)$. The interpretation of
$\eta_H(O \mid IQ)$ is analogous to the partial correlation coefficient in continuous-variable
statistics.
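
The equivalence of the two conceptions of information transmission, Equations (12) and (13), and the ratio in Equation (14) can be checked numerically. The following is a minimal sketch; the function names and the joint probabilities are hypothetical, chosen for illustration rather than taken from the study.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def transmission_stats(joint):
    """Return T by Eq. (12), T by Eq. (13), and the PRU of Eq. (14).

    joint[i][k] is the joint probability of input symbol i and output symbol k.
    """
    p_in = [sum(row) for row in joint]
    p_out = [sum(col) for col in zip(*joint)]
    h_in, h_out = entropy(p_in), entropy(p_out)
    h_joint = entropy(p for row in joint for p in row)
    t_eq12 = h_in + h_out - h_joint
    # Weighted sum of the row entropies H(O | i), weights p(i).
    h_out_given_in = sum(
        p_i * entropy(p / p_i for p in row)
        for row, p_i in zip(joint, p_in) if p_i > 0
    )
    t_eq13 = h_out - h_out_given_in
    return t_eq12, t_eq13, t_eq13 / h_out


# Hypothetical joint distribution of two input and two output symbols.
t_eq12, t_eq13, pru = transmission_stats([[0.4, 0.1],
                                          [0.1, 0.4]])
```

Both routes give the same transmission (about 0.28 bits for these values), and because $H(O)$ is one bit here the PRU has the same numerical value.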

The observed probabilities in a cross-classification do not in general coincide with
their population values. Hence measures of transmitted information are partly determined
by sampling fluctuations (i.e., they are generally biased). For a two-dimensional table the
observed measure may be larger than zero whereas its true population value is not. The
measure of transmitted information for a two-dimensional table is a simple function of the
likelihood-ratio statistic $G^2$ for the chi-square test of independence of the variables that
form the cross-classification (i.e., $T(I, O) = G^2 / 2n$, where $n$ is the total number of
observations; this form holds when $T$ is computed with natural logarithms, and with base-two
logarithms the divisor becomes $2n \ln 2$). Thus, the $G^2$ statistic can be used to test
whether $T(I, O)$ is different from zero (alternatively, the asymptotically equivalent Pearson
$\chi^2$ statistic can be used). If $G^2$ is not statistically significant, the variables are
assumed independent and $T(I, O)$ is assumed to be zero. The $G^2$ statistic is often used to
compare the fit of loglinear models that are fitted to a cross-classification by maximum
likelihood. For a two-dimensional table, $G^2$ compares the fit of the independence model
versus a model of (unrestricted) dependence of the variables. The latter model fits the data
perfectly, because the observed and predicted cell frequencies in the table are identical (it
is a so-called saturated model). If $G^2$ is not significant, the variables are presumably
independent. In the independence model the cell probabilities are fitted as the product of
the marginal probabilities (i.e., $p(s_i, o_j) = p(s_i) \times p(o_j)$). If the value of
$p(s_i, o_j)$ under independence is substituted in $T(I, O)$ the latter becomes zero (under the
dependence model, the value of $T(I, O)$ is the one computed from the observed
probabilities). Thus, setting $T(I, O)$ to zero if $G^2$ is not significant can also be
interpreted as using test statistics to fit an appropriate model (e.g., a loglinear model is
first fitted to the table) and subsequently computing measures of transmitted information
from the probabilities generated by the model.

For a three-dimensional table, where the third dimension is formed by the states, the
important models to consider are: (1) a model of complete independence, (2) a model of
dependence of $I$ and $O$, where the dependence is the same for the different states, and (3) a
model where the dependence of $I$ and $O$ is different for different states. In practice, model
fitting and testing may be complicated by observed frequencies of zero in the table. If zero
frequencies occur, it may not be possible to fit some models or some estimated frequencies
may become zero, in which case the degrees of freedom need to be adjusted. A second problem
that can occur in practice is that the generated frequencies are too small for the
chi-squared approximation to be valid.
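
The link between transmitted information and the likelihood-ratio statistic for a two-way table can be verified with a short sketch. It assumes a table of raw counts; the function names and the counts are hypothetical. Note that when $T$ is expressed in bits, as in Equation (11), the relation carries a factor $\ln 2$, that is, $G^2 = 2 n \ln(2)\, T(I, O)$. The sketch stops at the statistic and its degrees of freedom; obtaining a p-value would additionally require the chi-squared distribution function.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def g2_and_transmission(counts):
    """Return (G^2, T in bits, degrees of freedom) for a two-way count table."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    g2 = 0.0
    for i, row in enumerate(counts):
        for k, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing to G^2
                expected = row_tot[i] * col_tot[k] / n
                g2 += 2.0 * obs * math.log(obs / expected)
    t_bits = (entropy(r / n for r in row_tot)
              + entropy(c / n for c in col_tot)
              - entropy(obs / n for row in counts for obs in row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return g2, t_bits, df


# Hypothetical counts for two input symbols by two output symbols.
counts = [[20, 5],
          [5, 20]]
g2, t_bits, df = g2_and_transmission(counts)
```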
PART II


The  MAP  test:  A  spatial  reasoning  task 

The  MAP  test  is  a  spatial  reasoning  task  which  requires  children  to  move  an  object  from  an 
initial  state  (IS)  to  a  goal  state  (GS)  on  a  schematic  diagram  of  a  spatial  structure.  There 
are  two  versions  of  the  task.  One  version  suggests  a  city  plan  with  recognizable  markers 
(i.e.,  shops  and  churches)  and  a  recognizable  object  (i.e.,  a  bus).  The  other  version  has 
abstract  markers  (such  as  triangles  and  squares),  and  a  more  abstract  object  to  be  moved 
(i.e., a pawn) (see Figure 1). The hypothesis to be tested is that the different experimental
conditions  evoke  basically  the  same  input  alphabets  (operators:  forward,  back,  right,  and 
left),  but  otherwise  different  finite-state  machines  (i.e.,  mental  models).  This  paper  does 
not  include  a  complete  report  of  the  experimental  results.  For  this  the  interested  reader  is 
referred  to  Ippel  and  Beem  (1997). 

The MAP test consists of 32 items: 8 introductory items, 12 items in which IS and GS
have identical Y-coordinates but an interrupted path from $X_{IS}$ to $X_{GS}$, and 12 items in
which both the X- and Y-coordinates of IS and GS differ. The Y differences between IS and GS
are systematically manipulated with $y$'s of +5, +3, -3, and -5, where
$y = Y_{GS} - Y_{IS}$. For 12 items GS is located at the left side of the diagram and in the
remaining items GS is located at the right side of the diagram. The first 8 introductory
items involve easy problems, that is, these items have uninterrupted paths between IS and
GS. These items were not used in our analysis.

Participants.  Participants  were  48  second-grade  (mean  age  7.75  years,  23 
female)  and  52  third-grade  students  (mean  age  8.75  years,  22  female)  from  elementary 
schools  near  Leiden  (The  Netherlands). 


Figure 1.

Different diagrams representing an identical spatial structure in the two experimental
conditions.
Procedure. Students were randomly assigned to one of the two object conditions;
grade and gender were equally distributed over the conditions. Three experimenters were
involved in administering the tests. The tests were individually administered in separate
offices in a quiet part of the school. Test time varied between 30 and 45 minutes. The
experimenter instructed the students: (1) to place the object (either a pawn or a bus) in its
initial position; (2) to move the object from its initial state to the goal state while
following the shortest possible route; (3) to talk aloud while moving the object. Students
were asked to describe the moves of the objects such that (a) in the pawn condition, the
experimenter could perform the same move, even though he or she could not see the
actual move being performed, and (b) in the bus condition, an imaginary bus driver could
follow the instructions and drive the bus to its goal. The experimenter scored the
actual physical state of the object after a command had been executed. Verbal protocols
were audio taped and later transcribed. Two raters independently scored each protocol.
The raters scored each statement according to a predefined input category system.
Divergent scorings were discussed in a separate session until agreement was reached and
scored accordingly.


The  mental  model  hypothesis 

For each of the two conditions a detailed hypothesis concerning the mental model to be
evoked by the experimental conditions can be formulated. Figure 2 shows the state
transition diagrams representing the mental models for the bus and the pawn. The blocks
represent the states, the arrows represent the state transitions, and the letters next to the
arrows denote the input symbols, viz., L, R, F and B denote LEFT, RIGHT, FORWARD
and BACK, respectively. The letters N, E, S and W, in the state transition diagram
representing the mental model of a bus, refer to names of the four states $q_1, \ldots, q_4$
(i.e., north, east, south and west), respectively.


Figure 2.

The state transition functions represented as state transition diagrams.
Bus model. The bus starts out in state $q_1$, that is, facing 'north'. In this state the next
output function comprises four productions which map an element from the set of input
symbols $I = \{F, B, L, R\}$ onto the set of output symbols
$O = \{y-, y+, x-, x+, r-, r+\}$. The symbols $y-$, $y+$, $x-$, and $x+$ refer to movements
along the Y- and X-axis and the symbols $r+$ and $r-$ refer to a clockwise and a
counterclockwise rotation of 90 degrees, respectively. These productions are
$\lambda(F) \to y-$, $\lambda(B) \to y+$, $\lambda(L) \to r-$, and $\lambda(R) \to r+$. Figure 2
indicates that the choice of either F or B affects the system's physical state, that is, F and
B do affect the location of the bus. However, these input symbols would not change the state
of the mental model. Thus, the same set of productions holds for the next input. If one of
the input symbols L and R is chosen, not only the system's physical state will be affected,
but the mental model of the bus also will enter a new state (either $q_4$ or $q_2$). That is,
the bus turns either west or east and enters either $q_4$ or $q_2$, respectively. For example,
let us assume that the bus is facing north and that the student instructs the imaginary bus
driver to take a turn to the right, such that the bus now faces east. At the same time, a new
next output function holds with the productions: $\lambda(F) \to x+$, $\lambda(B) \to x-$,
$\lambda(L) \to r-$, and $\lambda(R) \to r+$. Table 1 summarizes the four different next output
functions that define the four states of the mental model of the bus.

Table 1. Next output functions of each of the four states of the mental model for the
bus.

λ      q1     q2     q3     q4
F      y-     x+     y+     x-
B      y+     x-     y-     x+
L      r-     r-     r-     r-
R      r+     r+     r+     r+
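
The bus model in Table 1 can be written down as a small finite-state machine. The sketch below is an illustration, not the authors' implementation: the output function is transcribed from Table 1, while the full state transition function is an assumption that generalizes the transitions stated in the text for the north state (R leads to east, L leads to west) to 90-degree rotations in every state. All names in the code are hypothetical.

```python
HEADINGS = ["north", "east", "south", "west"]  # q1, q2, q3, q4

# Next output function, transcribed from Table 1: OUTPUT[input][state].
OUTPUT = {
    "F": {"north": "y-", "east": "x+", "south": "y+", "west": "x-"},
    "B": {"north": "y+", "east": "x-", "south": "y-", "west": "x+"},
    "L": {heading: "r-" for heading in HEADINGS},
    "R": {heading: "r+" for heading in HEADINGS},
}


def next_state(state, symbol):
    """Assumed transition function: R turns clockwise, L counterclockwise."""
    idx = HEADINGS.index(state)
    if symbol == "R":
        return HEADINGS[(idx + 1) % 4]
    if symbol == "L":
        return HEADINGS[(idx - 1) % 4]
    return state  # F and B move the bus but leave the model state unchanged


def run(inputs, state="north"):
    """Feed a sequence of input symbols to the machine; return outputs and final state."""
    outputs = []
    for symbol in inputs:
        outputs.append(OUTPUT[symbol][state])
        state = next_state(state, symbol)
    return outputs, state


outputs, final_state = run(["F", "R", "F", "L", "B"])
```

In this run the same input F yields a different output before and after the right turn, which is the state dependence that the two-way table of inputs by outputs cannot capture.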

Pawn model. Objects such as a ball, or a pawn, do not have intrinsic perceptual features
such as a front side, or backside, and therefore, no left or right side. Sentences such as
"move the pawn forward" and "turn the pawn to the right" cannot be meaningfully related
to spatial features of the object itself, and therefore, most likely, will be interpreted in
relation to the direction the student is facing. As the student's position does not change
during the test session the same function is expected to map the set of input symbols
$I = \{F, B, L, R\}$ onto the set of output symbols $O = \{y-, y+, x-, x+\}$, that is,
$\lambda(F) \to y-$, $\lambda(B) \to y+$, $\lambda(L) \to r-$, and $\lambda(R) \to r+$.
Therefore, the mental model of a pawn is a one-state finite-state machine, an example of a
so-called trivial machine. In this paper we will not be concerned with the pawn condition
(see Ippel and Beem, 1997).
Empirical evaluation

The first aspect of a mental model as a finite-state machine to be considered concerns the
input alphabet ($I$) and output alphabet ($O$) that the participant uses to describe the
interaction with the artifact under investigation. These alphabets may differ from the
alphabets that define the proposed mental model. The latter alphabets will be referred to
as $I^*$ and $O^*$, respectively. Participants can deviate from model specifications in two
distinct ways. First, the participant uses fewer input and/or output symbols than the
hypothetical model assumes, that is, $I \subset I^*$ and/or $O \subset O^*$. This poses no
analytical problem, because entropy (i.e., $H(I)$ and $H(O)$) is a measure of observational
variety and unobserved possibilities do not enter into the measure (Krippendorff, 1986). Of
course, this in turn suggests that the participant does not utilize the artifact's
possibilities to its fullest extent.

More problematic is the possible event that $I$ and $I^*$ and/or $O$ and $O^*$ only
partially overlap. That is, the participant uses input symbols and/or output symbols
which are not members of the finite sets $I^*$ and $O^*$ that define the proposed finite-state
machine. Partial overlap represents a more serious misconception of the functioning of
the artifact. Notice that non-overlap of $O$ and $O^*$ is more serious than non-overlap of $I$
and $I^*$, because the latter dissimilarity may result from differences in labeling the input
symbols.

The second aspect to be evaluated relates to the contribution of the mental model
to the predictability, or control, of the behavior of the device. Information Theory
expresses this predictability, or control, aspect as information transmission. In Part II we
will adopt the second approach that defines information transmission as

$$T(I, O) = H(O) - H(O \mid I) \qquad (15)$$

where $H(O \mid I)$ denotes the entropy in $O$ given $I$. The question of whether or not the
participant's mental model encompasses a postulated set of model states may be
investigated by testing the statistical significance of the increase in proportional reduction
of error variance when the postulated set of model states is included in the analysis. This
analysis requires the comparison of a two-way classification table (see above) with a
three-way table. Several approaches for analyzing three-way classification tables are
possible. We choose an approach discussed by Wickens (1989) in which the transmission
of information between two factors is conditioned on levels of the third. More
specifically, we will consider the mapping of input symbols $s \in I$ on output symbols
$o \in O$ given the model states $q \in Q$. Let $Q$ be restricted to level $q_i$; then the
transmission between $I$ and $O$ is:

$$T(I, O \mid q_i) = H(O \mid q_i) - H(O \mid I q_i) \qquad (16)$$

Thus, $T(I, O \mid q_i)$ is a two-way transmission statistic for each level $q_i$. The
conditional information transmission $T(I, O \mid Q)$ is the weighted mean over these two-way
statistics, $T(I, O \mid Q) = \sum_i p(q_i)\, T(I, O \mid q_i)$. In the passages that follow,
data of two participants (Appendix 2) will be discussed to illustrate the analytical
possibilities of the approach.
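
The weighted mean of the per-state transmission statistics can be sketched as follows, assuming the three-way table is stored as a dictionary of counts keyed by (state, input, output). Within each state the two-way statistic is computed in the form $H(I) + H(O) - H(IO)$, which is equivalent to Equation (16). The function names and the counts are hypothetical and are not the data of Appendix 2.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def transmission(pairs):
    """T(I, O) in bits from a dict {(input, output): count}."""
    n = sum(pairs.values())
    p_in, p_out = defaultdict(float), defaultdict(float)
    for (symbol, output), count in pairs.items():
        p_in[symbol] += count / n
        p_out[output] += count / n
    h_joint = entropy(count / n for count in pairs.values())
    return entropy(p_in.values()) + entropy(p_out.values()) - h_joint


def conditional_transmission(counts):
    """T(I, O | Q): mean of T(I, O | q) over states, weighted by p(q)."""
    by_state = defaultdict(dict)
    for (state, symbol, output), count in counts.items():
        key = (symbol, output)
        by_state[state][key] = by_state[state].get(key, 0) + count
    n = sum(counts.values())
    return sum(sum(pairs.values()) / n * transmission(pairs)
               for pairs in by_state.values())


# Hypothetical counts: within each state every input has exactly one output.
counts = {
    ("north", "F", "y-"): 10, ("north", "R", "r+"): 10,
    ("east", "F", "x+"): 10, ("east", "L", "r-"): 10,
}
t_cond = conditional_transmission(counts)
```

For these invented counts each state transmits one full bit, which equals the one bit of output uncertainty within each state, so no uncertainty about the output remains once state and input are known.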
Participant #75. This participant demonstrates an awareness of the complete set
of input and output symbols that are part of the definition of the mental model as a
finite-state machine. Table 2 presents the data of participant #75 as a two-way classification
table of input symbols by output symbols, without considering the different model states.
It displays the amount of noise associated with each input symbol and its contribution to
the average amount of entropy that remains when the input symbols are known. Columns 2 to 7
contain the conditional probabilities $p(o \mid i)$, and column 9 the entropies
$H(O \mid i)$ associated with each input symbol. The final column displays the contribution of
each symbol to the average observed entropy in $O$, that is, $H(O \mid I)$. Each symbol's
contribution is weighted according to its relative frequency (i.e., $p(i)\, H(O \mid i)$). The
total entropy in $O$ is equal to 2.57, and $H(O \mid I)$ equals 1.64. Consequently,
$T(I, O) = .93$. The proportional reduction of error according to Equation (14) is .36. Note
that input symbol MoForw has the largest contribution to the average amount of noise produced
by the input symbols. As can be inferred from Appendix 2 this source of noise disappears when
different model states are taken into account (see also Table 3). Table 3 demonstrates
that the prediction of the output symbols was substantially improved by taking the
postulated model states into consideration. For each of the states of the mental model the
participant correctly maps the input symbols onto the output symbols, except for some
noise in the mapping of TuLeft onto the output symbols $r-$ and $r+$, while the participant's
mental model is in state "north". This noise represents the familiar phenomenon of
left-right confusion found in young children. Note that this analysis does not quantify the
amount of equivocation of TuLeft and TuRight in model state "west" (compare Table 3
with participant #75 data in Appendix 2). In fact, the analysis shows that left-right
confusion can take two forms: either it represents uncertainty about the actions attached
to an input symbol, or uncertainty about the input symbols attached to an action. Table 4
presents some conditional information transmission statistics. In summary, the
conditional uncertainty in $O$, $H(O \mid Q)$, amounts to 1.022. The conditional information
transmission $T(I, O \mid Q)$ equals .895. The proportional reduction in error now equals
.876.
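
The summary figures for the two-way analysis can be retraced from the entries of Table 2. The sketch below is only a check on the arithmetic: the conditional probabilities and the values of $p(i)$ are transcribed from Table 2, the value 2.57 for $H(O)$ is taken from the text, and the missing last entry of the MoBack row is assumed to be zero because that row already sums to one. The variable names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Rows of Table 2: (p(o | i) for y-, y+, x-, x+, r-, r+), p(i).
# Assumption: the r+ entry of MoBack, lost in the source, is 0.
table2 = {
    "MoForw":  ([0.277, 0.266, 0.213, 0.245, 0, 0], 0.686),
    "MoBack":  ([0, 1, 0, 0, 0, 0], 0.022),
    "TuLeft":  ([0, 0, 0, 0, 0.529, 0.471], 0.248),
    "TuRight": ([0, 0, 0, 0, 0.167, 0.833], 0.044),
}

# H(O | I) as the p(i)-weighted sum of the row entropies H(O | i).
h_o_given_i = sum(p_i * entropy(row) for row, p_i in table2.values())

h_o = 2.57                # total entropy in O, as reported in the text
t_io = h_o - h_o_given_i  # Equation (13)
pru = t_io / h_o          # Equation (14)
```

Up to rounding, the results agree with the reported values of about 1.64 for $H(O \mid I)$, .93 for $T(I, O)$ and .36 for the proportional reduction.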

Observe  now  that  T  (I,  O)  and  thus  the  proportional  reduction  in  error  in  Table  2 
would  be  zero  if  the  probabilities  in  every  row  would  be  the  same  as  the  marginal 


Table  2.  Association  of  input  symbols  and  output  symbols  without  considering  dif¬ 
ferent  model  states  (data  participant  #75). 


input 

symbols 

y- 

output  symbols 

y+  X-  x+ 

r- 

r+ 

P(i) 

H(Oli) 

p{l).H(0|i) 

MoForw 

0.277 

0.266 

0.213 

0.245 

0 

0 

0.686 

1.993 

1.367 

MoBack 

0 

1 

0 

0 

0 

0.022 

0 

0 

TuLeft 

0 

0 

0 

0 

0.529 

0.471 

0.248 

0.998 

0.248 

TuRight 

0 

0 

0 

0 

0.167 

0.833 

0.044 

0.650 

0.028 

p(o): 

0.177 

0.19 

0.136 

0.156 

0.129 

0.143 

H(0|l)= 

1.643 
probabilities. This condition defines independence of I and O. Deviations from
this condition define dependence or 'interaction' of I and O. In general, an interaction
exists if p(s_i, o_k) is not the same as p(s_i) x p(o_k) for at least one cell in the table.
Because the observed probabilities are subject to sampling fluctuations, this condition
may be tested formally by a statistical test for independence. For a two-way contingency
table such as Table 2, the well-known Pearson statistic or the loglikelihood ratio
statistic may be used. Both have approximately a chi-squared distribution in large
samples. More precisely, the validity of the chi-squared approximation depends on the
expected frequencies under the null hypothesis of independence. As observed earlier,
such tests of statistical significance can be interpreted as fitting models and testing the
differences of fit between models. Loglinear models for contingency tables are often
applied for this purpose. For the two-way contingency table, the model of independence
is M + I + O, where M is a "general mean" and I and O are the effects of input and
output. The model M + I + O + I*O is the model for dependence or interaction of I and
O. This is the so-called saturated model, which always fits the data perfectly (i.e., the
expected frequencies generated by the model are the same as the observed frequencies).
The loglikelihood ratio statistic for the


Table 3. Contributions to the entropy of input symbol per model state (data: participant #75).

model      input
states:    symbols:    p(i)     H(O|i)   p(i).H(O|i)
north      MoForw      0.500    0        0
           MoBack      0.058    0        0
           TuLeft      0.346    0.964    0.334
           TuRight     0.096    0        0
east       MoForw      0.719    0        0
           TuLeft      0.281    0        0
south      MoForw      1.000    0        0
west       MoForw      0.714    0        0
           TuLeft      0.250    0        0
           TuRight     0.036    0        0

independence model can thus be interpreted as comparing the fit of this model with the
model that includes the interaction. The Pearson statistic compares the observed and
expected frequencies. Loglinear models generalize to higher-dimensional contingency
tables. The saturated model for Table 4 is M + I + O + Q + I*O + I*Q + O*Q + I*O*Q.
The last term signifies that the interactions I*O are different for different states,
which effectively means that inclusion of the state effect increases the proportional
reduction in error.
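
The test of the independence model M + I + O for a two-way table can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the report: the counts are the input-by-output frequencies of the 137-observation table in Appendix 2 collapsed over model states, and the function name `independence_test` is hypothetical.

```python
import math

# Input-by-output frequencies collapsed over model states, read from the
# 137-observation table in Appendix 2 (columns: y-, y+, x-, x+, r-, r+).
table = {
    "MoForw":  [26, 25, 20, 23, 0, 0],
    "MoBack":  [0, 3, 0, 0, 0, 0],
    "TuLeft":  [0, 0, 0, 0, 18, 16],
    "TuRight": [0, 0, 0, 0, 1, 5],
}

def independence_test(rows):
    """Return (Pearson X2, loglikelihood ratio G2, df) for a two-way table."""
    n = sum(sum(r) for r in rows)
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    x2 = g2 = 0.0
    for i, row in enumerate(rows):
        for k, observed in enumerate(row):
            # Expected frequency under independence: n * p(s_i) * p(o_k)
            expected = row_tot[i] * col_tot[k] / n
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute nothing to G2
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(rows) - 1) * (len(col_tot) - 1)
    return x2, g2, df

x2, g2, df = independence_test(list(table.values()))
print(f"X2 = {x2:.1f}, G2 = {g2:.1f}, df = {df}")
```

Several expected frequencies in this table are below one, so the chi-squared reference distribution should be used with caution when interpreting either statistic.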
All loglinear models possible with the three factors I, O and Q were fitted to the
data of participant #75. Evaluating the fit of these models by the standard Pearson or
loglikelihood ratio statistics was, however, problematic, because many expected
frequencies were less than one, which probably makes the chi-squared approximation
invalid.* Models including two two-factor interactions generated so many zero expected
frequencies that no degrees of freedom remained for evaluating the models. However, it
seems safe to conclude that the no-interaction model (i.e., model M + I + O + Q) does not
fit the data of participant #75, χ²(80 df) = 753.176. This statistic compares the model
with the saturated model, which is the model for which the input-output interaction is
different for different states. The model including the main effects and input-output
interaction could not be evaluated because too many expected frequencies were zero.
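
As a rough illustration of what fitting the no-interaction model M + I + O + Q involves, the sketch below computes expected frequencies as the product of the three one-way margins and compares them with the observed cells. The cell counts are read from the 137-observation table in Appendix 2; the function name `mutual_independence_fit` is hypothetical, and the degrees of freedom are counted naively as cells minus fitted parameters (84 here), with no adjustment for empty cells. Neither the df nor the statistics should therefore be expected to match the values reported above.

```python
import math
from collections import defaultdict

# (model state, input symbol, output symbol) -> frequency; absent cells are zero.
cells = {
    ("north", "MoForw", "y-"): 26, ("north", "MoBack", "y+"): 3,
    ("north", "TuLeft", "r-"): 11, ("north", "TuLeft", "r+"): 7,
    ("north", "TuRight", "r+"): 5,
    ("east", "MoForw", "x+"): 23, ("east", "TuLeft", "r+"): 9,
    ("south", "MoForw", "y+"): 25,
    ("west", "MoForw", "x-"): 20, ("west", "TuLeft", "r-"): 7,
    ("west", "TuRight", "r-"): 1,
}

def mutual_independence_fit(cells):
    """Fit M + I + O + Q and return (Pearson X2, G2, naive df)."""
    n = sum(cells.values())
    # One-way margins for state (axis 0), input (axis 1) and output (axis 2).
    margins = [defaultdict(int), defaultdict(int), defaultdict(int)]
    for key, count in cells.items():
        for axis, level in enumerate(key):
            margins[axis][level] += count
    x2 = g2 = 0.0
    for q, n_q in margins[0].items():
        for i, n_i in margins[1].items():
            for o, n_o in margins[2].items():
                expected = n_q * n_i * n_o / n ** 2
                observed = cells.get((q, i, o), 0)
                x2 += (observed - expected) ** 2 / expected
                if observed:
                    g2 += 2.0 * observed * math.log(observed / expected)
    n_cells = math.prod(len(m) for m in margins)
    df = n_cells - 1 - sum(len(m) - 1 for m in margins)
    return x2, g2, df

x2, g2, df = mutual_independence_fit(cells)
print(f"X2 = {x2:.1f}, G2 = {g2:.1f}, naive df = {df}")
```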


Table 4. Conditional information transmission statistics (data: participant #75).

model states   p(q_j)   H(O|q_j)   p(q_j).H(O|q_j)   H(O|I∩q_j)   T(I,O|q_j)   p(q_j).T(I,O|q_j)
q1: north      0.380    1.700      0.645             0.334        1.366        0.518
q2: east       0.234    0.857      0.200             0            0.857        0.200
q3: south      0.182    0          0                 0            0            0
q4: west       0.204    0.863      0.176             0            0.863        0.176
                        H(O|Q) =   1.022                          T(I,O|Q) =   0.895
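
The entries of Table 4 can be recomputed from the state-by-input-by-output frequencies. The sketch below assumes that the conditional transmission is the difference between the output entropy within a state and the output entropy given both input and state, which is the relation the tabled values satisfy (e.g., 1.700 - 0.334 = 1.366). The cell counts are read from the 137-observation table in Appendix 2; all variable names are illustrative.

```python
import math
from collections import defaultdict

# (model state, input symbol, output symbol) -> frequency; absent cells are zero.
cells = {
    ("north", "MoForw", "y-"): 26, ("north", "MoBack", "y+"): 3,
    ("north", "TuLeft", "r-"): 11, ("north", "TuLeft", "r+"): 7,
    ("north", "TuRight", "r+"): 5,
    ("east", "MoForw", "x+"): 23, ("east", "TuLeft", "r+"): 9,
    ("south", "MoForw", "y+"): 25,
    ("west", "MoForw", "x-"): 20, ("west", "TuLeft", "r-"): 7,
    ("west", "TuRight", "r-"): 1,
}

def entropy(counts):
    """Shannon entropy in bits of a collection of frequencies."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return sum(c / n * math.log2(n / c) for c in counts)

n = sum(cells.values())
states = defaultdict(lambda: defaultdict(dict))  # state -> input -> {output: count}
for (q, i, o), c in cells.items():
    states[q][i][o] = c

rows = {}
for q, inputs in states.items():
    n_q = sum(sum(outs.values()) for outs in inputs.values())
    out_totals = defaultdict(int)
    for outs in inputs.values():
        for o, c in outs.items():
            out_totals[o] += c
    h_o = entropy(out_totals.values())                      # H(O|q)
    h_o_given_i = sum(sum(outs.values()) / n_q * entropy(outs.values())
                      for outs in inputs.values())          # H(O|I, q)
    rows[q] = (n_q / n, h_o, h_o_given_i, h_o - h_o_given_i)

h_o_given_q = sum(p * h for p, h, _, _ in rows.values())    # H(O|Q)
t_given_q = sum(p * t for p, _, _, t in rows.values())      # T(I,O|Q)

for q, (p, h, hi, t) in rows.items():
    print(f"{q:6s} p={p:.3f} H(O|q)={h:.3f} H(O|I,q)={hi:.3f} T={t:.3f}")
print(f"H(O|Q) = {h_o_given_q:.3f}, T(I,O|Q) = {t_given_q:.3f}")
```

Weighting each state's values by p(q) and summing gives the two summary figures in the bottom row of the table.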

Finally, the data provide some insight into a redundancy caused by the mental
model. Recall that in each problem the bus starts out facing “north”. The problems were
designed such that in 6 problems reaching the goal state required a y+ translation
(either forward or back), and in 6 problems a y- translation (likewise). The remaining 12
problems had an initial state (IS) and a goal state (GS) with identical Y-coordinates, but
the path between IS and GS was blocked so that the bus initially had to be moved along
the Y-axis (either a y- move or a y+ move). Also, in 12 problems reaching the goal state
required an x- translation and in 12 problems an x+ translation. Since the bus can
only move forward or back, this implies that the bus first must make a turn (r- or r+) in
order to be able to move along the X-axis. Further, recall that the instruction urged the
students to take the shortest route from IS to GS. In summary, the experimental
conditions were designed such that a uniform distribution of input symbols was not
hampered by extraneous variables. Redundancy is defined as the difference between the
entropy of a uniform distribution, H(I)max, and the observed entropy, H(I) (Krippendorff,
1986). The relative frequencies with which participant #75 used the input symbols F, B, L,
and R in model state “north” were .500, .057, .346, .096, respectively. Or rather, if we


* There is evidence that Pearson's X² more closely resembles the chi-squared
distribution in sparse tables (see, e.g., Read & Cressie, 1988). Read and Cressie also
present statistics for which the chi-squared approximation is closer to the distribution in
small samples with small expected frequencies, but these were not used here.
correct for the left-right confusion in model state “north” these proportions would be
.500, .057, .211, and .230, respectively. The maximum uncertainty, H(I)max, equals 2
bits, whereas the observed entropy (after left-right correction), H(I), is equal to 1.697
bits, which means an uncertainty reduction of 16.5 %. A plausible explanation for this
redundancy seems to be the notion that a bus cannot easily be driven backward.
Therefore, a low likelihood of choosing input symbol B seems to follow from the mental
model. This redundancy may be an indicator of the strength of the mental model and
may differ across participants.
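
The redundancy computation can be retraced with a short sketch. It takes the left-right corrected proportions quoted in the text as given and assumes that the maximum entropy is that of a uniform distribution over the four input symbols; the function name `entropy_bits` is illustrative.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Left-right corrected proportions for F, B, L, R in model state "north",
# as quoted in the text.
p_north = [0.500, 0.057, 0.211, 0.230]

h_max = math.log2(len(p_north))   # uniform distribution over four symbols
h_obs = entropy_bits(p_north)     # observed entropy H(I)
redundancy = h_max - h_obs        # in bits
relative = redundancy / h_max     # as a share of the maximum

print(f"H(I)max = {h_max:.3f} bits, H(I) = {h_obs:.3f} bits")
print(f"redundancy = {redundancy:.3f} bits ({100 * relative:.1f} % of H(I)max)")
```

Expressed as a share of H(I)max, the redundancy obtained this way is about 15 %, slightly below the percentage quoted in the text, which may rest on a different normalization or on unrounded frequencies.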


Table 5. Two-way cross-tabulation of input and output symbols (data: participant #32).

input                       output symbols
symbols     y-    y+    x-    x+    r-    r+    not-O   total
MoForw      13          5     2                 7       27
MoBack            13                                    13
TuLeft                              2                   2
TuRight                                   2             2
not-I                                           31      31
total       13    13    5     2     2     2     38      75

Participant #32. We will discuss only one aspect of the data of this participant.
Participant #32 appears to use an input and an output alphabet which contain the
postulated alphabets as subsets (see data participant #32 in Appendix 2). Table 5 presents a
simplified overview of the mappings of the input symbols to the output symbols. Verbal
instructions and physical moves of the bus that could not be categorized as elements of
the postulated input and output alphabets were categorized as not-I and not-O, respectively.
Table 5 clearly shows that most input symbols s ∈ I map onto output symbols o ∈ O. All
input symbols s ∉ I map onto output symbols o ∉ O. This suggests the existence of a
core model with an input alphabet and output alphabet similar to the postulated alphabets.
In addition, participant #32 uses symbols which seem to represent a relatively
independent, somewhat degenerated version* of the core model. Figure 3 shows the conditional
probabilities of the occurrence of a not-I symbol given the occurrence of a particular
model state. The relationship between these probabilities and the model states can be
accurately described (R² = .93) by a second-degree polynomial, suggesting that the
difficulty the participant experienced in utilizing the core mental model (i.e., more or less
similar to the postulated model of the bus) is a function of the absolute difference
between the participant’s orientation and the orientation of the object on the map.


*  Degenerated  version  because  these  (input  and  output)  symbols  have  lost  the 
distinction  between  turn  and  move. 
Figure 3. Conditional probabilities of I symbols and not-I symbols given the occurrence of a
model state. [Plot not reproduced; axis label: conditional probabilities.]
General Discussion


In this paper we proposed to formally characterize mental models as a finite-state
machine, which is fully described by its functions δ and λ. The motivation to provide
such a formal description was that the theory of finite-state machines defines a class of
mathematical structures with well-known properties. Thus, results of formal analysis can
serve as a basis for studying cognitive processes and cognitive representations of
interactive environments in several ways. First, this formalism makes it possible to achieve a
fully parsimonious description of any particular mental model, because the theory of
finite-state machines provides systematic ways of achieving a minimal form of a machine
that defines a given set of input-output mappings (Denning et al., 1978). Second, this
formalism can also be used as a tool to identify factors that determine the complexity of
an interactive device such that devices with different complexity can be designed and
experimentally compared (see, e.g., Ippel and Meulemans, 1997). In particular, in
combination with an information-theoretical analysis, the information load of an interactive
device in terms of information to be transmitted can be objectively quantified (i.e., the
maximum entropy to be reduced in order to attain control over a device). Mental models
can be evaluated in terms of the reduction in information transmission load they produce.
If we consider the example presented in Part II of this paper, it becomes clear that the
mental model of the bus induces a preference in participants for the input signal F over B
and a slight preference for R over L. The redundancy created by this input action pattern
in turn implies a reduction in information transmission load. At the same time, however,
it requires longer input strings (i.e., more input actions) to achieve the same goal. To
answer the question as to how beneficial this is to the effective control of devices, further
research is needed.
APPENDIX 1


Let P: W → Q be a function of W into Q. Let R be the relation on W defined by sRt iff
P(s) = P(t), for s and t in W. Then:

A. R is an equivalence relation. A proof of this proposition can be found in Denning
et al. (1978). It is included in this Appendix in order to facilitate the reading of
proposition B.

B. The function m: W/R → Q is a one-to-one function.

Proof.

A. Since P: W → Q is a function, for each q ∈ Q, let W_q be the set of states in W that
map onto q:

    W_q = {x ∈ W | q = P(x)}

We assume that P is an everywhere defined function. Thus, each x ∈ W is an element of
exactly one set W_q and the sets W_q form a partition of W. The corresponding relation R is
defined by

    sRt iff P(s) = P(t)

Note that R is clearly reflexive and symmetric. Suppose now sRt and tRu for s, t, and u in
W. Then P(s) = P(t) = P(u), so sRu holds; R is therefore transitive, and hence R is an
equivalence relation. R partitions W into equivalence classes usually denoted by [w]:

    [w] = {x ∈ W | P(x) = P(w)}

The partition of W consists of all equivalence classes of W and is denoted W/R. W/R is
also called the quotient set defined on W by R. This establishes that a function
determines an equivalence relation on its domain.

B. Let R be the equivalence relation defined on a set W and let W/R be the corresponding
quotient set (see ad A). Let p: W → W/R and m: W/R → Q be functions. If [w] ∈ W/R
and q ∈ Q, then the function m: W/R → Q is defined by

    m([w]) = q

Since m: W/R → Q is a function, m([w]) has a single value. Also [w] = W_q = {x ∈ W | q =
P(x)} (see ad A). Thus, the inverse of m, which assigns to q the class [w] = W_q, is also a
function, that is, a function from Q onto W/R.
Therefore, the function m: W/R → Q is a one-to-one function.
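
The construction in propositions A and B can be illustrated with a small sketch. The example is hypothetical and not taken from the study: environmental states W are headings in degrees, and the categorization P assigns each heading to one of four compass model states. The helper `quotient` (an assumed name) builds the equivalence classes of W/R together with the map m.

```python
from collections import defaultdict

def quotient(W, P):
    """Partition W by sRt iff P(s) == P(t); return m as {class [w]: P(w)}."""
    classes = defaultdict(set)
    for w in W:
        classes[P(w)].add(w)   # W_q = {x in W | q = P(x)}
    return {frozenset(members): q for q, members in classes.items()}

# Hypothetical environmental states: headings in steps of 30 degrees.
W = list(range(0, 360, 30))

def P(heading):
    """Hypothetical categorization of a heading into a compass model state."""
    return ["north", "east", "south", "west"][((heading + 45) % 360) // 90]

m = quotient(W, P)

# Proposition A: the classes are non-overlapping and together cover W.
assert sum(len(c) for c in m) == len(W)
assert set().union(*m) == set(W)

# Proposition B: m is one-to-one (distinct classes have distinct images).
assert len(set(m.values())) == len(m)

for members, q in sorted(m.items(), key=lambda item: min(item[0])):
    print(sorted(members), "->", q)
```

Because every class is built from exactly one value of P, two different classes can never share an image, which is the one-to-one property claimed for m.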
APPENDIX 2


Participant #75

model     input                       output symbols
states:   symbols:    y-    y+    x-    x+    r-    r+    not-O   Total
north     MoForw      13                                  6       19
          MoBack            13                                    13
          TuLeft                              1                   1
          TuRight                                   2             2
          not-I                                           17      17
east      MoForw                        2                         2
          not-I                                           6       6
south     not-I                                           2       2
west      MoForw                  5                       1       6
          TuLeft                              1                   1
          not-I                                           6       6
Total                 13    13    5     2     2     2     38      75


Participant #32

model     input                    output symbols
states:   symbols:    y-    y+    x-    x+    r-    r+    Total
north     MoForw      26                                  26
          MoBack            3                             3
          TuLeft                              11    7     18
          TuRight                                   5     5
east      MoForw                        23                23
          TuLeft                                    9     9
south     MoForw            25                            25
west      MoForw                  20                      20
          TuLeft                              7           7
          TuRight                             1           1
Total                 26    28    20    23    19    21    137


